    On the efficiency of estimating penetrating rank on large graphs

    P-Rank (Penetrating Rank) has been suggested as a useful measure of structural similarity that takes account of both incoming and outgoing edges in ubiquitous networks. Existing work often utilizes memoization to compute P-Rank similarity iteratively, which requires cubic time in the worst case. Moreover, previous methods focus mainly on the deterministic computation of P-Rank and lack a probabilistic framework that scales well to large graphs. In this paper, we propose two efficient algorithms for computing P-Rank on large graphs. The first observation is that a large body of objects in a real graph usually share similar neighborhood structures. By merging such objects with an explicit low-rank factorization, we devise a deterministic algorithm that computes P-Rank in quadratic time. The second observation is that by converting the iterative form of P-Rank into a matrix power series, we can leverage random sampling to probabilistically compute P-Rank in linear time with provable accuracy guarantees. Empirical results on both real and synthetic datasets show that our approaches achieve high time efficiency with controlled error and outperform the baseline algorithms by at least one order of magnitude.
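The iterative form of P-Rank sketched in this abstract can be illustrated as follows. This is a minimal brute-force sketch, not the paper's optimized algorithm; the decay factor `c`, the in/out weight `lam`, and the iteration count are illustrative defaults:

```python
import numpy as np

def p_rank(in_adj, out_adj, lam=0.5, c=0.8, iters=10):
    """Iterative P-Rank sketch: similarity driven by both in- and
    out-neighbors, with S(a, a) fixed to 1.

    in_adj[i]  -- list of in-neighbors of node i
    out_adj[i] -- list of out-neighbors of node i
    lam weighs in-link vs. out-link evidence; c is the decay factor.
    """
    n = len(in_adj)
    s = np.eye(n)                          # S(a, a) = 1
    for _ in range(iters):
        new = np.eye(n)
        for a in range(n):
            for b in range(n):
                if a == b:
                    continue
                t_in = t_out = 0.0
                if in_adj[a] and in_adj[b]:
                    t_in = sum(s[i][j] for i in in_adj[a] for j in in_adj[b])
                    t_in *= c / (len(in_adj[a]) * len(in_adj[b]))
                if out_adj[a] and out_adj[b]:
                    t_out = sum(s[i][j] for i in out_adj[a] for j in out_adj[b])
                    t_out *= c / (len(out_adj[a]) * len(out_adj[b]))
                new[a][b] = lam * t_in + (1 - lam) * t_out
        s = new
    return s
```

Each full pass over all node pairs costs quadratic work per pair in the worst case, which is the cubic behavior the paper's two algorithms are designed to avoid.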

    Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification

    The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving performance on the text classification task by extending the traditional “bag of words” representation to incorporate syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document implies proximity in the HT graph as well. We argue that the high precision exhibited by our WSD algorithm on various human-disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora and achieve a systematic improvement in classification accuracy with the SVM algorithm, especially when the training set is small.
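The stated intuition (words close together in a document should have senses close together in the HT graph) can be sketched as choosing one sense per context word so that the total pairwise graph distance is minimized. The brute-force search and the `dist` callback below are assumptions for illustration, not the paper's actual algorithm:

```python
from itertools import product

def disambiguate(word_senses, dist):
    """Pick one candidate sense per word so that the total pairwise
    distance between the chosen senses in the thesaurus graph is
    minimized (exhaustive search over all sense combinations).

    word_senses -- list of candidate-sense lists, one per context word
    dist(s, t)  -- graph distance between two senses in the HT
    """
    best, best_cost = None, float("inf")
    for combo in product(*word_senses):
        cost = sum(dist(s, t) for i, s in enumerate(combo)
                   for t in combo[i + 1:])
        if cost < best_cost:
            best, best_cost = combo, cost
    return best
```

The exhaustive search is exponential in the number of ambiguous words, so a practical system would restrict the context window or use a greedy approximation.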

    Supply driven mortgage choice

    Variable mortgage contracts dominate the UK mortgage market (Miles, 2004). This dominance has important consequences for the transmission mechanism of monetary policy decisions and for systemic risk (Khandani et al., 2012; Fuster and Vickery, 2013). It raises an obvious concern that a mortgage market such as the UK's, where the major proportion of mortgage debt is either at a variable rate or fixed for less than two years (Badarinza et al., 2013; CML, 2012), is vulnerable to changes in the interest rate regime. Theoretically, mortgage choice is determined by both demand and supply factors. So far, most of the existing literature has focused on the demand side, and empirical investigation of supply side factors in mortgage choice decisions remains limited. This paper uniquely explores whether supply side factors may partially explain observed/ex-post mortgage type decisions. Empirical results indicate that lenders' profit motives and mortgage funding/pricing issues may have contributed to preferences for variable rate contracts. Securitisation is found to positively impact gross mortgage lending volumes while negatively impacting the share of variable lending flows. This shows that an increase in securitisation not only improves liquidity in the supply of mortgage funds but also has the potential to shift mortgage choices toward fixed mortgage debt. The policy implications may involve a number of measures, including reconsideration of the capital requirements for fixed as opposed to variable rate mortgage debt, growing securitisation, and optimisation of mortgage pricing policies.

    A Knowledge-Based Semantic Kernel for Text Classification

    Typically, in textual document classification the documents are represented in the vector space using the “Bag of Words” (BOW) approach. Despite its ease of use, the BOW representation cannot handle word synonymy and polysemy and does not consider semantic relatedness between words. In this paper, we overcome the shortcomings of the BOW approach by embedding a known WordNet-based semantic relatedness measure for pairs of words, namely Omiotis, into a semantic kernel. The suggested measure incorporates the TF-IDF weighting scheme, thus creating a semantic kernel that combines both semantic and statistical information from text. Empirical evaluation on real datasets demonstrates that our approach achieves improved classification accuracy with respect to the standard BOW representation when Omiotis is embedded in four different classifiers.
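A semantic kernel of the kind described here can be sketched as a bilinear form over TF-IDF document vectors. The relatedness matrix `S` below is a stand-in for pairwise Omiotis scores, which this sketch does not compute; with `S` equal to the identity, the kernel reduces to the plain BOW inner product:

```python
import numpy as np

def semantic_kernel(x1, x2, S):
    """Semantic kernel sketch: K(d1, d2) = x1^T . S . x2, where x1 and x2
    are TF-IDF document vectors over a shared vocabulary and S[i][j] is
    a word-relatedness score (e.g. an Omiotis-style value). Off-diagonal
    entries let related-but-distinct words contribute to the similarity.
    """
    return float(np.asarray(x1) @ np.asarray(S) @ np.asarray(x2))
```

Such a kernel can be plugged directly into any kernel classifier (e.g. an SVM) in place of the linear kernel, which is how a BOW baseline would be compared against the semantic variant.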

    Results of the seventh edition of the BioASQ Challenge

    The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is to promote systems and methodologies through the organization of a challenge on the tasks of large-scale biomedical semantic indexing and question answering. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the best systems were able to outperform the strong baselines, suggesting that state-of-the-art systems are continuously improving and pushing the frontier of research.

    MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing

    The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) to index almost all 24 million citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. Large-scale automatic MeSH indexing has two challenging aspects: the MeSH side and the citation side. On the MeSH side, each citation is annotated with only 12 (on average) out of all 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, treat text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. By utilizing a “learning to rank” (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates deep semantic representations to address the challenge on the citation side. MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.
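The "learning to rank" integration described in this abstract can be sketched as a weighted combination of scores from multiple predictors per candidate MeSH term, followed by a top-k cutoff. This is an illustrative aggregation step only, not NLM's MTI or the published MeSHLabeler implementation; the predictor names and weights are hypothetical:

```python
def rank_mesh_terms(candidates, weights, k=12):
    """LTR-style aggregation sketch: each candidate MeSH term carries
    scores from several predictors (e.g. a k-NN recommender and a
    per-term classifier); a learned weight vector combines them and
    the top-k terms are returned as the indexing for one citation.

    candidates -- {term: [score_from_predictor_1, score_from_predictor_2, ...]}
    weights    -- one weight per predictor, learned on validation data
    """
    scored = {term: sum(w * s for w, s in zip(weights, feats))
              for term, feats in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

The default `k=12` mirrors the average number of MeSH terms per citation quoted above; a real LTR model would also learn a per-citation cutoff rather than using a fixed k.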